
Display tables from other clients as formatted text #6201

Open
olivierlambert wants to merge 1 commit into element-hq:develop from olivierlambert:feature/render-markdown-tables-as-text

Conversation

@olivierlambert

Summary

Markdown tables sent from other clients (e.g. Element Web) arrive as HTML <table> elements in the formatted_body. Currently, the wysiwyg library (io.element.android:wysiwyg v2.41.1) silently strips these — leaving only flattened text with no structure whatsoever.

This PR pre-processes tables into <pre><code> blocks containing a pipe-based text representation:

Header A | Header B
---------+---------
Cell 1   | Cell 2
Cell 3   | Cell 4

Why this approach

The core constraint: the wysiwyg Safelist

HtmlToDomParser.document() calls Jsoup.clean() with a hardcoded Safelist that only allows: a, b, strong, i, em, u, del, code, ul, ol, li, pre, blockquote, p, br. All <table>, <thead>, <tbody>, <tr>, <td>, <th> tags are stripped before the DOM is even constructed. This means any table-aware processing must happen before HtmlToDomParser.document() runs — not after.
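
To see the constraint in isolation, here is a minimal sketch that runs Jsoup.clean() with a Safelist built from the tag list quoted above. The names assumedSafelist and cleanLikeWysiwyg are hypothetical, and the real Safelist inside the wysiwyg library may be configured differently (attributes, protocols, and so on). The sketch only illustrates that table tags do not survive cleaning while their text does.

```kotlin
import org.jsoup.Jsoup
import org.jsoup.safety.Safelist

// Assumption: tag list copied from the PR description, not from the wysiwyg source.
val assumedSafelist: Safelist = Safelist.none().addTags(
    "a", "b", "strong", "i", "em", "u", "del", "code",
    "ul", "ol", "li", "pre", "blockquote", "p", "br"
)

// Hypothetical helper: cleans HTML the way the description says HtmlToDomParser does.
// Tags outside the safelist are dropped, but their text content is kept.
fun cleanLikeWysiwyg(html: String): String = Jsoup.clean(html, assumedSafelist)
```

Feeding this a message containing a table should return the cell text with every table tag gone, which is why a converter running after this step has nothing left to convert.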

Why <pre><code> as the output format

We need a replacement that:

  1. Survives the wysiwyg Safelist (both <pre> and <code> are allowed)
  2. Is already rendered well by the wysiwyg library (styled monospace code block with background)
  3. Preserves the visual structure of tabular data (alignment requires a monospace font)

<pre><code> satisfies all three. The wysiwyg library already renders these as styled code blocks with a monospace font, which is exactly what pipe-formatted tables need for proper column alignment.

Why pre-process the HTML string rather than the DOM

The initial implementation called dom.convertTablesToText() after HtmlToDomParser.document(). This didn't work because Jsoup.clean() (inside HtmlToDomParser) had already stripped all table tags. The fix is to:

  1. Parse the raw HTML with Jsoup.parse() (no safelist)
  2. Convert <table> elements to <pre><code> in that DOM
  3. Serialize back to HTML via doc.body().html()
  4. Pass the processed HTML to HtmlToDomParser.document()

A fast-path check ("<table" !in html) avoids the extra parse for the vast majority of messages that contain no tables.
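
The four steps and the fast path can be sketched as follows. This is an illustration under assumptions, not the code in ToHtmlDocument.kt: preprocessTables is a hypothetical name, and the table conversion is passed in as a lambda so the sketch stays self-contained. The final hand-off to HtmlToDomParser.document() is left to the caller.

```kotlin
import org.jsoup.Jsoup
import org.jsoup.nodes.Document

// Hypothetical helper: returns HTML that is safe to pass to the wysiwyg parser.
fun preprocessTables(html: String, convertTables: (Document) -> Unit): String {
    // Fast path: most messages contain no table, so skip the extra parse.
    if ("<table" !in html) return html

    // Step 1: parse the raw HTML without any safelist, so table tags survive.
    val doc = Jsoup.parse(html)

    // Step 2: rewrite <table> elements in place (conversion supplied by the caller).
    convertTables(doc)

    // Step 3: serialize the body back to an HTML string for the next stage.
    return doc.body().html()
}
```

Note that the substring check is case-sensitive as written, so a sender emitting uppercase tags would take the fast path. Whether that matters depends on what other clients actually send.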

Separator style: -+- vs |

The separator line uses -+- as the column joiner (e.g. ------+------), which visually aligns the + with the | in data rows. This is intentional: in a monospace font, the + sits exactly under each |, giving a clean grid appearance.
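
A small sketch of why the + lands under the |: both joiners are three characters wide with the significant character in the middle. The function names dataLine and separatorLine are illustrative only, and the sketch pads every column including the last, which may differ from what the PR does.

```kotlin
// Pads each cell to its column width and joins the cells with " | ".
fun dataLine(cells: List<String>, widths: List<Int>): String =
    cells.mapIndexed { i, cell -> cell.padEnd(widths[i]) }.joinToString(" | ")

// Builds one run of dashes per column and joins the runs with "-+-",
// so each '+' sits in the same character position as the '|' in data lines.
fun separatorLine(widths: List<Int>): String =
    widths.joinToString("-+-") { "-".repeat(it) }
```

With column widths of 8 and 8 (the length of "Header A" and "Header B"), separatorLine yields the ---------+--------- line shown in the Summary.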

Header detection heuristic

  1. If <thead> exists → its rows are headers (separator placed after them)
  2. Otherwise, if the first <tr> contains only <th> elements → treated as a single header row
  3. Otherwise → no header, no separator line

This covers the two common patterns: explicit <thead>/<tbody> structure (Element Web) and simple <th>-first-row tables.
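
The heuristic could look roughly like this on a jsoup Element. TableRows and splitHeaderAndBody are hypothetical names, and the sketch makes two simplifying assumptions: it treats a <thead> without rows as absent, and it ignores nested tables.

```kotlin
import org.jsoup.nodes.Element

// Hypothetical result type: header rows (possibly empty) and the remaining rows.
data class TableRows(val header: List<Element>, val body: List<Element>)

fun splitHeaderAndBody(table: Element): TableRows {
    val rows: List<Element> = table.select("tr").toList()
    val theadRows: List<Element> = table.select("thead tr").toList()

    // Rule 1: rows inside <thead> are the header; everything else is body.
    if (theadRows.isNotEmpty()) {
        val bodyRows = rows.filter { row -> theadRows.none { it === row } }
        return TableRows(theadRows, bodyRows)
    }

    // Rule 2: a first row made up only of <th> cells is a single header row.
    val first = rows.firstOrNull() ?: return TableRows(emptyList(), emptyList())
    val cells = first.children()
    val onlyTh = cells.isNotEmpty() && cells.all { it.tagName() == "th" }

    // Rule 3: otherwise there is no header, so no separator line is needed.
    return if (onlyTh) TableRows(listOf(first), rows.drop(1)) else TableRows(emptyList(), rows)
}
```

A caller would emit the separator line only when header is non-empty.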

Edge cases handled

  • Empty tables → removed from the DOM (no crash, no empty code block)
  • Unequal column counts → shorter rows padded with empty cells to the max column count
  • Nested tables → .text() on cells naturally flattens nested content
  • Single-column tables → rendered as plain lines (no pipes needed since joinToString(" | ") on a single element produces no separator)
  • Cell whitespace → trimmed via element.text().trim()
  • Multiple tables → each converted independently (list is snapshotted before iteration to avoid concurrent modification)
  • jsoup auto-wraps <tbody> → jsoup always wraps bare <tr> elements in a <tbody> during parsing; the extraction logic handles this correctly through the tbody != null branch
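
Three of these cases (empty tables, unequal column counts, cell whitespace) can be illustrated on plain string rows. normalizeRows is a hypothetical helper, not a function from HtmlTableToText.kt, and it assumes the cell text has already been extracted from the DOM.

```kotlin
// Trims every cell and pads short rows with empty cells up to the widest row.
// Returns an empty list when there is nothing to render (empty table).
fun normalizeRows(rows: List<List<String>>): List<List<String>> {
    val columnCount = rows.maxOfOrNull { it.size } ?: 0
    if (columnCount == 0) return emptyList()
    return rows.map { row ->
        row.map { it.trim() } + List(columnCount - row.size) { "" }
    }
}
```

The single-column case needs no special handling: joinToString(" | ") on a one-element list returns that element unchanged.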

Files changed

  • New: HtmlTableToText.kt — standalone Document.convertTablesToText() extension
  • Modified: ToHtmlDocument.kt — pre-processes raw HTML before HtmlToDomParser.document()
  • New: HtmlTableToTextTest.kt — 10 unit tests (simple table, thead, th detection, unequal cols, empty, surrounding content, multiple tables, single column, whitespace, integration)
  • Modified: ToPlainTextTest.kt — 1 additional test for the plain-text pipeline

Limitations and future considerations

  • No colspan/rowspan support — cells spanning multiple columns or rows are treated as single cells. This could be improved but adds significant complexity for a rare case.
  • The toPlainText() path collapses formatting: PlainTextNodeVisitor uses TextNode.text(), which normalizes whitespace, so the pipe table loses its newlines in plain-text output. The primary rendering path (HTML → wysiwyg) preserves formatting correctly.
  • Ideally the wysiwyg library would support tables natively — this is a pragmatic workaround. If the wysiwyg library adds table support in the future, this pre-processing can be removed.
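
To make the plain-text limitation concrete, the sketch below approximates whitespace normalization with a regular expression. This is an assumption-laden stand-in, not jsoup's actual implementation: collapseWhitespace is a hypothetical name, and it only shows the effect described above, where newlines between table lines become single spaces.

```kotlin
// Approximation of whitespace normalization: any run of whitespace,
// including newlines, is replaced by a single space.
fun collapseWhitespace(text: String): String = text.replace(Regex("\\s+"), " ")
```

Applied to a three-line pipe table, this produces a single line, so the column alignment is lost even though every cell value is still present.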

Test plan

  • HtmlTableToTextTest — 10 tests all passing
  • ToPlainTextTest — all tests passing including the new table test
  • ToHtmlDocumentTest — all existing tests still passing (no regression)
  • Manual testing: send a table from Element Web, verify it renders as a code block with aligned columns on Android

🤖 Generated with Claude Code

The wysiwyg library (io.element.android:wysiwyg) does not support
<table> tags — its Safelist strips them during Jsoup.clean(), leaving
only flattened text with no structure.

This pre-processes the raw HTML *before* it reaches the wysiwyg
parser, replacing <table> elements with <pre><code> blocks containing
a pipe-based text representation that the wysiwyg library already
renders as styled code blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@olivierlambert olivierlambert requested a review from a team as a code owner February 13, 2026 12:31
@olivierlambert olivierlambert requested review from ganfra and removed request for a team February 13, 2026 12:31
@github-actions
Contributor

Thank you for your contribution! Here are a few things to check in the PR to ensure it's reviewed as quickly as possible:

  • Your branch should be based on origin/develop, at least when it was created.
  • The title of the PR will be used for release notes, so it needs to describe the change visible to the user.
  • The tests pass locally when running ./gradlew test.
  • The code quality check suite passes locally when running ./gradlew runQualityChecks.
  • If you modified anything related to the UI, including previews, you'll have to run the Record screenshots GH action in your forked repo: that will generate compatible new screenshots. However, given Github Actions limitations, it will prevent the CI from running temporarily, until you upload a new commit after that one. To do so, just pull the latest changes and push an empty commit.

@CLAassistant

CLAassistant commented Feb 13, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the Z-Community-PR (Issue is solved by a community member's PR) label Feb 13, 2026
@olivierlambert olivierlambert changed the title from "Render HTML tables as pipe-formatted text in code blocks" to "Display tables from other clients as formatted text" Feb 13, 2026
@olivierlambert
Author

FYI, my goal is to offer a first idea for fixing the fact that tables aren't displayed correctly on Element X (on my Android phone). I'm not experienced enough to write a solution myself; it was done via Claude Code.

I would understand if you are not willing to merge this. I'm simply hopeful it could bring some ideas or make it easier to solve this functional limitation. I did my best to make the result fit the existing tests and code base.

